-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
BUG: Segfault due to float_precision='round_trip' #15148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Current coverage is 85.54% (diff: 100%)
|
@@ -384,3 +384,5 @@ Bug Fixes | |||
- Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`) | |||
|
|||
- Bug in ``Series.dt.round`` inconsistent behaviour on NAT's with different arguments (:issue:`14940`) | |||
|
|||
- Bug in ``pd.read_csv()`` with ``float_precision='round_trip'`` which caused a segfault when a text entry is parsed on Python 2.7 or later |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue number; remove Python 2.7 or later (we in fact don't support < 2.7)
@@ -1774,7 +1774,9 @@ double precise_xstrtod(const char *str, char **endptr, char decimal, char sci, | |||
double round_trip(const char *p, char **q, char decimal, char sci, char tsep, | |||
int skip_trailing) { | |||
#if PY_VERSION_HEX >= 0x02070000 | |||
return PyOS_string_to_double(p, q, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in fact the entire ifdef can be removed here, we only support >= 2.7
@@ -483,10 +484,13 @@ cdef class TextReader: | |||
self.verbose = verbose | |||
self.low_memory = low_memory | |||
self.parser.converter = xstrtod | |||
self.parser.converter_nogil = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while we are at it, can you rename self.parser.converter
-> self.parser.double_converter
(and same with convert_nogil
-> double_converter_nogil
@@ -1699,7 +1703,12 @@ cdef _try_double(parser_t *parser, int col, int line_start, int line_end, | |||
result = np.empty(lines, dtype=np.float64) | |||
data = <double *> result.data | |||
na_fset = kset_float64_from_list(na_flist) | |||
with nogil: | |||
if parser.converter_nogil: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put a 1-line comment here, as the naming convention is confusing
@@ -1774,7 +1774,9 @@ double precise_xstrtod(const char *str, char **endptr, char decimal, char sci, | |||
double round_trip(const char *p, char **q, char decimal, char sci, char tsep, | |||
int skip_trailing) { | |||
#if PY_VERSION_HEX >= 0x02070000 | |||
return PyOS_string_to_double(p, q, 0); | |||
double r = PyOS_string_to_double(p, q, 0); | |||
PyErr_Clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the test cover the error condition here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does. An exception would otherwise occur in test_float_precision_round_trip_with_text
.
Updated. |
@@ -482,11 +485,14 @@ cdef class TextReader: | |||
|
|||
self.verbose = verbose | |||
self.low_memory = low_memory | |||
self.parser.converter = xstrtod | |||
self.parser.double_converter = xstrtod | |||
self.parser.double_converter_nogil = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of an indicator variable, why don't we do this
self.parser.double_converter_nogil = xstrtod
self.parser.double_converter_withgil = None
if float_precision == ..:
....
elif float_precsion == 'round_trip':
self.parser.double_converter_nogil = None
self.parser.double_converter_withgil = round_trip
Then the function pointer itself is descriptive. same effect, just easier to read (and you don't have to know what the boolean is).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
round_trip calls back into Python, so the GIL must be held. It also fails to silence the Python exception, leading to spurious errors. Closes #15140.
lgtm |
thanks @Rufflewind |
`round_trip` calls back into Python, so the GIL must be held. It also fails to silence the Python exception, leading to spurious errors. Closes pandas-dev#15140. Author: Phil Ruffwind <rf@rufflewind.com> Closes pandas-dev#15148 from Rufflewind/master and squashes the following commits: c513d2e [Phil Ruffwind] BUG: Segfault due to float_precision='round_trip'
round_trip
calls back into Python, so the GIL must be held. It also fails to silence the Python exception, leading to spurious errors.Closes #15140.
git diff upstream/master | flake8 --diff
(Most of them are false positives because flake8 doesn't understand Cython or it was code I didn't even touch)